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interest. The filtering technique also suppresses noise and allows us to calculate skew 
and kurtosis when the point process is highly homogeneous. The algorithm can be 
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close to optimal speed-up for the sampling process on the AlphaServer GS320 NUMA 
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1 Introduction 


Modern cosmology, the study o f the larg e scale structure 
and evolution of our universe llPeacoc 3 l200fil . has ad¬ 
vanced to the point where we can now answer some very 
fundamental questions about the distribution of matter 
within our universe. Ever since Einstein postulated 
the th eory of Gen eral Relativity and, together with De 
Sitter lIPaisL showed how it could be applied to 

the universe as a whole, generations of physicists have 
pondered on the question of what is the overall geometry 
of our universe. Within the past few years observations 
of the relic m ic rowav e radiation from the “Big Bang” 
llBennett et al 1 l2nn,‘t have shown that the universe 
exhibits a geometry quite unlike that expected from 
theoretical prejudices alone. 

Although on the largest scales the distribution of 
matter within our universe is both homogeneous and 
isotropic, on smaller scales—less than l/20th the size of 
our visible universe—it is highly inhomogeneous. Even 
though the matter distribution of the universe was ex- 
ceptionally smooth 300,0 00 years after the creation event 
llKolb and Turneil llQQOh . over billions of years the ubiq¬ 
uitous attraction of the gravitational force amplifies the 
minute fluctuations in the early matter distribution into 
the structure we see today. Moreover, the current best 
theories of structure formation suggest that the matter 
distribution we observe is formed in a ‘hierarchical clus¬ 
tering’ manner with the small structures merging to form 
larger ones and so forth i Peacockl l200r)ll . This growth of 
structure is accelerated by an unseen massive ‘dark matter’ 
component in our universe. Although dark matter cannot 
be observed directly, there is sufficient evidence within ob¬ 
servations to conclusively infer its existence. Modifications 
to Newton’s equations, to change gravitational accelera¬ 
tions on large scales, have had limited success, and cannot 
presen tly be cast in a form compatib le with General Rela¬ 
tivity jSanders and McGaughLl2002^ . 

Understanding the distribution of matter within our lo¬ 
cal universe can tell us much about the cosmic structure 
formation process. While on the very largest scales grav¬ 
ity is the dominant force, on smaller scales gas pressure 
forces, from the gaseous inter-galactic (IGM) and inter¬ 
stellar mediums (ISM), can play a significant role. In 
clusters of galaxies, for example, hydrodynamic forces pro¬ 
duced by the IGM lead to a distribution of gas that is held 
close to hydrostatic equilibrium. Indeed, understanding 
the interaction between the ISM and the stars that con¬ 
dense out of it, is currently one of the hottest r esearch 
areas in cosmology I Thacker and CouchmanLl200lll . Since 
if we can understand this process we are much closer to 
being able to infer how the galaxies we observe relate to 
the underlying distribution of dark matter that dominates 
the evolution of structure. 

Although we are yet to absolutely determine the relation 
between galaxies and dark matter, measuring the distribu¬ 
tion of galaxies is the only way of infering the distribution 


of all matter (visible or not). Me asurements of t he speed 
of recession of local galaxies, led iHubbld to form 

the distance-redshift relation now know as ‘Hubble’s Law’, 
which has become a bedrock for the development of cosmo¬ 
logical theory. Although modern surveys of galaxies use an 
updated, and more accurate, form of the distance-redshift 
relation to uncover the spatial distribution of galaxies, the 
principles involved remain the same as those used by Hub¬ 
ble. 

Aided by highly automated observing and computer 
driven data analysis, a new generation of high quality 
galaxy redshift surveys is mapping our loc al Universe 
with exquisite precision. The 2 degree field didfl. I2nn4 
and Sloan Digital Sky Survey lISDSSl. |2004 provide as¬ 
tronomers with a survey of the local universe out to a red- 
shift of z ~ 0.3, and contain over 200,000 and one million 
(when complete) redshifts respectively. In figure^we show 
the distribution of galaxies for the 2dF survey to give an 
visual impression of the type of inhomogeneity observed. 

Traditionally, one of the primary goals of analysis of 
redshift surveys is the calculation of the two point auto¬ 
correlation function (2-pt GF). The large sample volumes 
provided by 2dF and the SDSS have allowed the 2-pt GF 
to be calculated with great accuracy. While the initial con¬ 
ditions produced by the “Big Bang” are widely believed to 
exhibit Gaussian statistics {e.g. Kolb and Turner, 1990), 
the formation of structure by gravitational instability in¬ 
troduces non-Gaussian features into the statistics of the 
matter distribution. Hence, the 2-pt CF cannot be a com¬ 
plete descriptor of the underlying matter distribution at 
late times. Astronomers were aware of this issue com¬ 
paratively early in the development of the field, and the 
theoretical basis for calculating hi gher order stat istics was 
developed through the 1970’s fsee lPeebl^ 1 198nll for a de¬ 
tailed summary). 

Early attempts to measure higher order moments of the 
ma ss distribution , via the counts-in-cells method (again 
see IPeeblesI lll98nll 'l. suffered from inadequate sample size. 
Because higher order moments tend to be progressively 
dominated by the most dense regions in a given sample, 
ensuring that adequate sampling has been performed is of 
utmost importance. Ensuring low sample variance is also 
necessary, and given one sample the only way to check 
this is to analyse sub-samples, which rapidly depletes the 
available information. 

From a theoretical perspective, higher order statistics 
are interesting in relation to gravitational perturbation 
theory and the evolution of non-linear gravitational clus¬ 
tering. Analyses examining the accuracy of numerical sim¬ 
ulation methods often rely upon higher order statistics. 
This is especially important in t he study of gravitational 
cluste ring in ‘scale free’ universes I Gouchman and PeeblesL 
Il998ll . The development of fast, parallel, statistical algo¬ 
rithms is vital to progress in this arena. While the de¬ 
velopment of parallel simulation algorithms has advanced 
forward rapidly {e.g. Thacker et ah, 2003) development of 
parallel analysis tools has lagged behind. This is partially 
due to the fact that the benefits of developing a parallel 






















































Figure 1: Distribution of galaxies in the two main slices from the 2dF galaxy redshift survey. Each point represents a 
galaxy, and they combine to trace filament and wall structures in three dimensions. The geometry of the distribution 
is directly related to the statistical properties of the conditions in our universe following the “Big Bang”. 


analysis code can be shorted lived because the required 
analyses can change rapidly (much faster than the simula¬ 
tion algorithms themselves). The rapid development times 
available on shared memory parallel machines make them 
an ideal complement to large distributed memory machines 
which most simulations are now run on. 

Although throughout this paper we discuss the applica¬ 
tion of our new method to cosmology, it can be applied 
equally well to the statistics of any point process. Indeed 
the terms ‘particle’ and ‘point’ are often used interchange¬ 
ably. The method can also be modified to apply to dif¬ 
ferent dimensions, although in 2 dimensions the gains are 
expected to be less significant due to the reduced amount 
of work in the counts-in-cells method. 

The layout of this paper is as follows: in section „ we 
quickly review the statistics we wish to calculate. This is 
followed by an explicit description of our new algorithm, 
and an examination of its performance. Next we present 
a brief case study on applying our algorithm to cosmology 
and conclude with a brief summary. 


2 Statistics: Moments and Correlation Functions 


Due to space limitations a full discussion of the counts- 
in-cells method, and how it is related to higher order mo¬ 


ments, is beyond the scope of this paper. However an 
excellent discussion of counts-in-cells and sta t istical mea¬ 
surement processes may be found in IPeeblesI lll980l). For 
completeness, we briefly summarize the statistics we are 
interested in measuring. 

The 2-pt CF, ^(r), measures the radial excess/deficit 
over Poisson noise for a point process. It is defined in terms 
of the joint probability, SP, of finding objects in volume 
elements SVi and 5V2 separated by a radial distance ri 2 , 
viz, 

5P = n^{l + i{ri2)Wi5V2, ( 1 ) 

where n is the average number density of the point process. 
The Fourier transform pair of the 2-pt CF is the power 
spectrum, P(|k|), 

P(|k|) = 7 ;^ / ( 2 ) 

which is used to describe the statistics of the initial density 
field in cosmology. 

The joint probability idea can be generalized to n-pt 
processes, for example, the reduced 3-pt CF is defined by; 

5P = n^SViSV2dV3X 


{I + ^{ri) + ^{r2) + ^{ra) + C{ri,r2,r3)), (3) 











where ri,r 2 and are defined by the triangle described 
by the three points under consideration. For cosmology, 
the assumptions of homogeneity and isotropy require that 
^ 2 , be a symmetric function of these three lengths. 
Higher order correlation functions follow in a logical man¬ 
ner. 

Using the counts-in-cells method, it can be shown that 
the second central moment ^2 = {{N — nU)^), where N is 
the count of points within spheres of radius r (and volume 
U), is given by 

^2 = nV + n^ [ dVidV2^{ri2). (4) 

Jv 

The third central moment /ra = {{N — nV)^), is given by 


through very dense cells in a grid code). To counter this 
problem one can use a hierarchical ( tree) storage o f count s 
in cells on a grid, as discussed in ISzanudi et al ] 

This greatly improves calculation time, since the summa¬ 
tion over particles within cells is much reduced at large 
radii. Using this method it has been reported that 10® 
samples from a data set with 47 million particles can be 
generated in 8 CPU hours. 

The basis of our alternative ‘smooth field algorithm’ is 
that each counts-in-cells value is a discrete sample of the 
local density field smoothed over the scale of the sample 
sphere. In the continuum limit of an infinite number of par¬ 
ticles, defining the density 5(x), the sampled value Ss{x) 
can be written as an integral over the spherical top-hat 
function W(r,rf) 


At 3 = 3^l2 - 2nV + n^ [ dVidV2dV3C (5) 

Jv 

Both these equations show how integrals over the correla¬ 
tion functions enter in to calculations of the central mo¬ 
ments. Relationships for the higher order moments can be 
constructed, but rapidly become lengthy to calculate {e.g. 
Fry and Peebles, 1978). 

The final definition we require is one that relates higher 
order cumulants to the variance. To aid our discussion 
we introduce the following notation: the over-density of a 
point process relative to the mean density, p, is given by 
(5(x) = Ap(x)/p where Ap = p{x) — pis the local deviation 
from the average density. Although this is most usually 
recognized as a continuum description, it also provides a 
useful construct for our discussion of point processes. For 
example, since the local density of particles in the counts- 
in-cells method is given by N/V, S(x) ~ {Af/V — n)/n. 
From this definition of S{x) the n-th order connected mo¬ 
ments of the point process define the ‘Sp’ statistics via the 
following definition^: 


(< 5 ") = (< 5 ^) 


p-1 


( 6 ) 


The Sp statistics play a central role in analysis of red- 
shift survey s. To date, up to S q has been calculated by 


researchers ijSzapudi et al. 


p to *9 

lEi^. 


3 The Smoothed Field Algorithm (SFA) 


While the counts-in-cells method is conceptually beau¬ 
tiful in its relation to the Sp statistics, it is computation¬ 
ally strenuous to calculate. As the radius of the sampling 
sphere becomes larger, on average the work to calculate 
the count within the sphere will grow at a cubic rate. In 
reality the situation can be potentially worse, since inef¬ 
ficiencies in particle book-keeping can appear (i.e. having 
to search far down tree-nodes, or equivalently searching 

^The Sp statistics are motivated by the assumption that, given 
the 2-pt CF, 5(r) = (ro/r)^, the n-pt correlation functions scale as 
(Axi,..., Aa;„) = A (^xi,Xn), see Balian and Scha¬ 

effer (1989). 


W{r,rf) = 1^’ 
of radius r/ and the raw density field 5(x), to give, 

/ S{x + r)W{\r\,rf)d^r, (8) 

VTH Jv 

where V is the volume of the periodic sample region and 
Vth the volume of the sample sphere (a 3 dimensional top- 
hat). Via the Convolution Theorem, the Fourier transform 
of Ss{x), namely, (5s(k) is given by 

,5,(k)=5(k)W((|k|,r/). (9) 

Thus we can quickly calculate the entire 6s(x) field by 
Fourier methods. 

The discrete calculation of counts can be expressed in 
almost the same way, except that the continuous density 
field is replaced by a discrete sum of three dimensional 
Dirac delta functions, S^{x), 

1 r 

Ns{x) = —— / 6^{x + r-Xt)W{\r\,rf)d^r (10) 

Vth Jv ^ 

where Np is the number of particles in the simulation, and 
Xi gives the position of particle i. In the counts-in-cells 
method the integral over the volume is replaced by a sum¬ 
mation within the given search volume Vth- 

To connect these two approaches all that is needed is 
a smoothing function that will convert a discrete set of 
points to a continuous density field. We require a smooth¬ 
ing function, A(x), which can be summed over the particle 
positions to reproduce a smooth field S(x). Provided we 
can do this, we can use Fourier methods to precalculate all 
of the required 6s (x) values and greatly reduce the amount 
of work. In practice it will be necessary to define a discrete 
density on a grid, and then use an interpolation process 
to provide a continuum limit. The smoothing idea has 
been studied in great depth (see iHocknev and EastwoodI 
lll988l) for explicit details) and there exists a series of com¬ 
putationally efficient smoothing strategies that have good 
Fourier space properties, as well as having well defined 


r < r/; 
r > rf, 


(7) 


















interpolation function pairs. The most common smooth¬ 
ing function (‘assignment function’) mechanisms are ‘CIC’ 
(Cloud-in-Cell), and ‘TSC’ (Triangular Shaped Cloud). 
Cloud-in-cell interpolation provides a continuous piece- 
wise linear density field, while TSC has a continuous value 
and first derivative. The only potential issue of difficulty 
is that sampling a continuous periodic variable at discrete 
points means that the Fourier domain is finite and periodic 
and thus has the possibility of being polluted by aliased in¬ 
formation (with images separated by 27r/L where L is the 
size of the period). In practice, the higher order assign¬ 
ment functions have a sufficiently sharp cut-off in Fourier 
space that this is not a significant problem^. 

Having established that we can convert our discrete set 
of points into a continuous density defined by a grid of val¬ 
ues and an interpolation function, we must decide upon the 
size of grid to be used. The initial configuration of points 
(corresponding to a low amplitude power spectrum) is 
such that the majority of neighbouring particles have sep- 
arations close to the mean inter-particle separation Np . 
Therefore, for this configuration we use a grid defined such 
that = Np. This is beneficial on two counts: firstly, the 
grid requires a comparatively small amount of memory to 
store than the particle data, and secondly, it captures al¬ 
most all the density information stored in the particle dis¬ 
tribution (since most particles are separated by sizes close 
to the grid spacing). 

To summarize, the steps in the SFA are as follows: 


1988) which is defined (in 1-dimension) by; 


Ai{x) 


f I-h |a;|2 (M - l) , 1x1 < 1; 

h(2-|x|)3, 1<N<2; 

[O, |x|>2. 


( 11 ) 


and the 3-dimensional function is defined A(x, y, z) = 
Ai{x)Ai{y)Ai{z). Note that A{x,y,z) is not an isotropic 
function, which in this case is beneficial for speed, since it 
is unnecessary to calculate a square root. It also simplifies 
calculating the Fourier transform of the assignment func¬ 
tion since all the dimensions are now separable. Note that 
Ai(x) has a comparatively wide smoothing profile, and 
therefore its Fourier transform is a strongly peaked func¬ 
tion with good band-limiting properties. This is advan¬ 
tageous for dealing with the aliasing problem mentioned 
earlier. Indeed, the Fourier transform of Ai{x) is: 


i(fc.) 


/ sin(fca^/2) \ 

V fcx /2 J 


( 12 ) 


which has a 1/fc'^ suppression of power. This is sufficiently 
sharp to ensure that only the first and second images need 
be accounted for in G{k) (the Green’s function associated 
with top-hat filtering and the assignment process). 


4 Performance Comparison 


1. Use an assignment function, A(x), to smooth the mass 
(m) associated with each of the particles on to a grid. 
This creates the grid representation of the density 
field, p(x): 

Np 

p(x) = ^(^* “ 

2. Fourier transform the density field p(x) to form 5(k) 

3. Multiply by G{k), the product of the Fourier trans¬ 
form of the real space top-hat filter {W{k,rf) = 
3(sin(fcrj) — kvf cos{krf))/kvj) and the inverse of the 
assignment function filter, which includes an alias sum 
out to two images 

4. Fourier transform the resulting field back to real space 

5. Calculate 5s(x) at all sampling positions using the 
interpolation function pair to the original assignment 
function 4l(x) 

6 . Calculate desired statistics 

In this paper we have used a 3rd order polynomial as¬ 
signment function (‘PQS’, see Hockney and Eastwood, 

^ See I Hockney and EastwoodI for a discussion of this point. 

Aliases can only be removed completely by assigning information to 
all points on the sampling grid for each point/particle, which it too 
computationally expensive to be feasible. 


Before proceeding to parallelize the algorithm, it is 
instructive to compare the speed of the serial algorithm 
as compared to the counts-in-cells method. In figure [3 we 
show the time to calculate 2.1 x 10® samples on 2.6 x 10® 
points as a function of the sample radius. A (logarithmic) 
least-squares fit showed that the time for the standard 
counts-in-cells method (version 1) grows as r^ ®, which 
is slightly lower than the expected value of r®. For the 
second counts-in-cells algorithm we developed, which 
is optimized by storing a list of counts in the chaining 
cells used to control particle book-keeping in the code, 
the dependence with radius was found to be r^. This is 
understood from the perspective that most of the work in 
each sample has already been performed in the summation 
within chaining cells and that the work for each sample 
thus becomes dependent on sorting over the cells at the 
surface of the sample area, which is proportional to r^. 
However comparison of both these methods to the SFA 
shows they are far slower in comparison. Because the en¬ 
tire (5s (x) field is precalculated (modulo the interpolation 
process to non-grid positions) in the SFA method, the 
time to calculate the samples is constant as a function 
of radius, and is exceptionally fast. Based up the data 
presented in figure |3 we initially estimated being able 
to calculate 10® sample points on a 512® data set in 
less than 2 CPU hours, which is over 4x faster than the 
results reported for tr ee-optimized counts-in-cells methods 
( Szapudi et al 1 I 1999 II . We have recently confirmed this 
result using our parallel code, which took 6.5 minutes on 
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Figure 2: Comparison of speed for two versions of the 
counts-in-cells (CIC) method versus SFA for 2.6 x 10® par¬ 
ticles and 2.1 x 10® sample points at different smooth¬ 
ing radii. Least squares fits are given for all data. The 
first counts-in-cells method is a straight summation over 
particles contained within the sampling sphere, while the 
second method is optimized to store a count of particles 
within chaining cells. Provided a chaining cell lies within 
the radius of the sampling sphere then the sum within the 
chaining cell is not necessary. 

32 processors to calculate 10® samples on a 512® particle 
data set produced for a project being conducted at the 
Pittsburgh Supercomputing Center. 


5 Parallelization 


Typically when calculating statistics, the value of the 
sampling radius (equivalently the top-hat radius) is varied 
so that the entire sampling process must be repeated many 
times. Thus the most obvious method of parallelization is 
to create several different grids for each smoothing radius 
and process them in parallel. However, available memory 
considerations may well make this impractical. Instead, 
it is better to parallelize each calculation for each radius. 
This is non-trivial as the following algorithmic steps must 
be parallelized: 

1. Calculation of Green’s function 

2. Forward FFT of density grid to /c-space 

3. Multiplication of density grid by Green’s function 

4. Reverse FFT to real space 

5. Sum over sample points 

The first four items have all been parallelized previously for 
our main simulation code (see Thacker et ah, 1998). The fi¬ 
nal step, while appearing to be somewhat straightforward, 
must be approached with care (as we shall demonstrate). 


Figure 3: Comparison of speed-ups for different implemen¬ 
tations of the SFA. Version 1(a) is the standard method 
with block data decomposition but no directed assignment 
within RADs. Version 1(b) forces data and threads to 
spread across RADs. Version 2 corresponds to our data 
local sampling and is clearly superior. 

The obvious issues which need to addressed are (1) en¬ 
suring each thread has a different random seed for sample 
positions and (2) that the sum reduction of the final values 
across threads is performed. In practice, both of these is¬ 
sues can be dealt with in very straightforward ways using 
the OpenMP shared memory programming standard. Sum 
reductions can be controlled via the REDUCTION primitive 
while different random seeds can be set using an array of 
initial values. Parallelization in this environment turned 
out to be straightforward. 

Tests on a 32 processor HP GS320 (1 GHz Alpha EV6/7 
processors) at the Ganadian Institute for Theoretical As¬ 
trophysics (CITA), showed reasonable speed-up (see figure 
0 ), but comparatively poor efficiency (22%) when 32 pro¬ 
cessors were used. There is also a noticeable step in the 
speed-up at 4 to 8 processors. This step is caused by mem¬ 
ory for a job being moved to a second memory domain, or 
‘resource affinity domain’ (RAD), within the machine. The 
32 processor machine has 8 RADs in total, connected via 
a cross-bar, with 4 processors belonging to each RAD. La¬ 
tency to remote RADs is significantly higher than to local 
RADs, which explains the increased execution time. Addi¬ 
tionally, as the amount of traffic on the cross-bar between 
the RADs increases, latencies are known to increase by 
very large factors (up to 3000 nanoseconds, Gvetanovic, 
2003). This is a serious bottleneck in the GS320 design 
which has been removed in the latest GS1280 machine. 
Ultimately, to improve performance on the GS320, it is 
necessary to increase the locality of the sampling technique 
to reflect the locality of memory within the machine, and 
avoid sending data across the cross-bar. 

Note that using a block decomposition of data across 
the RADs means that locality is only really necessary in 
one axis direction. Therefore, we adopted the following 










Figure 4: Initial and final point configurations for a slice through a simulation with 2.6 x 10® particles. 


strategy to improve performance: 

1. Block decomposition of the Ss(x) grid across RADs 

2. Pre-calculate the list of random positions in the z-axis 

3. Parallel sort the list of random positions in increasing 
z value 

4. Parallelize over the list of z positions, calculating x 
and y values randomly 

The resulting sample still exhibits Poisson noise statistics 
and is therefore valid for our purposes. However, the sam¬ 
ple points are now local in the z direction, which greatly 
reduces the possibility of remote access due to the block 
assignment of data. The scaling improvement for this 
method is shown in Hgure|3 The improvement is striking. 
We achieved a 1.2x increase in performance for the sin¬ 
gle processor result alone, while at 32 processors we have 
achieved a 4.8 x improvement in speed-up and a tripling 
of the parallel efficiency (82%). Note that the speed-up is 
still not perfect for the improved version. This may be a 
bandwidth issue since the interpolation at each sampling 
point requires 64 grid values, which breaks down into 16 
cache lines, with only 8 floating point calculations per¬ 
formed for all the data in each cache line. Note that it is 
unlikely that using the next lowest level of interpolation 
(TSC) would help. TSC requires 27 points grid points per 
sample, which is 9 cache lines, with 6 floating point calcu¬ 
lations per cache-line. Thus the overall ratio of calculation 
to memory fetches is actually reduced. 


6 Application: Moments in Initial Conditions 


The initial conditions for cosmological structure are 
prescribed by initial density, temperature and velocity 
helds. Although there is debate over whether evolution in 
the early universe (such as magnetic fields) may induce a 
non-G aussian signal in the initial conditions l|White et a,1.L 
Em, most researchers believe that the density field 
is Gaussian process, and the velocity may be derived 
directly from it. In the absence of non-Gaussian features, 
the density held, which is usually discussed in terms of 
the linear over-density S, is completely described by its 
continuous power spectrum P(k) = Ak", where A is a 
normalization constant. This initially smooth held evolves 
under gravity to produce the locally inhomogeneous and 
biased distribution of galaxies we observe today (see hgure 
^ which compare particles positions from initial to hnal 
outputs). Early evolution, when (5(x) 1, is in the linear 

regime and can be described by perturbation theory. 
As the over-density values approach and later exceed 
unity, it is necessary to use simulations to calculate the 
non-linear evolution. Thus, ideally, the initial conditions 
for simulations should correspond to the latest time that 
can be followed accurately by perturbation theory. 

IScoccimarrol lll998l) has developed an algorithm for the 
fast calculation of the particle positions required for cos¬ 
mological simulations via 2nd order Lagrangian perturba¬ 
tion theory (2LPT). We have recently implemented this 
algorithm in parallel using OpenMP. Although 2LPT re¬ 
quires more computation, it has significant advantages over 
the standard 1st order technique (known as the Zel’dovich 
(1968) approximation) as higher order moments exhibit far 
less transient deviations at the beginning of the simulation. 
Further, one should in principle be able to follow the initial 
evolution to slightly later epochs using 2LPT and therefore 


















































































begin simulations at a slightly later time. In practice, the 
transient deviation issue is most significant. 

In general, the more negative the spectral index the 
faster the initial transients die away. This is helpful, since 
most simulations are conducted with an effective spectral 
index, n, of between -1.5 to -3 (depending on the size of the 
simulation volume). Also, although we have focused solely 
on particle position statistics in this paper, it is worth 
noting that a similar analysis can be applied to velocity 
fields dehned on the point process. Analysis of the tran¬ 
sients in the velocity divergence held, 9 = V.u, shows an 
even greater i mprov ement when using the 2LPT method 
llScoccimarrol Il99^ . 


To test whether our new 2LPT code was reproducing the 
correct results we have compared the measured S 3 statis¬ 
tics for our 2LPT initial conditions versus those produces 
with the Zel’dovich approximation (1st order). At the ini¬ 
tial expansio n factor of a =1, th e ZA predicts the following 
value for S 3 l Bernarde ^ ll994 : 


00 

^3 = y-(3 + n), (13) 

while 2LPT predicts; 

S '3 = y - (3-f n). (14) 


Thus after performing the 2nd order correction the value 
of S 3 should increase by 6/7. In hgureElwe show the cal¬ 
culated values of S 3 for two sets of initial conditions, one 
created using the ZA and the other with the additional 
2LPT correction. Both the SPA measured values of S 3 are 
high for this particular set of phases (as compared to the 
theoretical prediction), but we have confirmed that alter¬ 
native random seeds can produce similar results. Indeed 
we have found the values of S 3 are quite dependent upon 
the phases of the Fourier waves used, and achieving a value 
that is asymptotic to the theoretical value is extremely dif¬ 
ficult. We are currently investigating this phenomenon in 
more detail. However, a brief visual inspection of figure 
El provides evidence that the residual. A, between the ZA 
and 2LPT results is close to 6/7 ~ 0.86. Analysis of the 
set of residuals between the two lines gives A = 0.88±0.03 
(Icr deviation), confirming that our code is accurately re¬ 
producing the difference in S 3 values. 


7 Summary and Discussion 


We have presented a new fast algorithm for rapid 
calculation of one point cumulants for point pro¬ 
cesses. Our algorithm is based upon a smoothed 
field approach, which reproduces the underlying sta¬ 
tistical properties of the point processes field from 
which it is derived. The method is significantly faster 
than counts-in-cells methods because the overhead of 
evaluating the number of particles in a given sphere has 
been removed. We are able to calculate 10® sample points 



Radius (Number of average interparticle spacings) 


Figure 5: Comparison of S 3 calculated via CIC versus SFA 
on an n =0 initial condition, with the theoretical result 
shown for reference. SFA shows a good match out to 4 
inter-particle spacings at which point it begins declining. 
CIC appears accurate on small scales but rapidly diverges 
away from the true signal. We have confirmed that as the 
simulation evolves, and the effect of shot noise is reduced, 
both methods converge to similar values. 


on a 512^ data set in less than 2 CPU hours, which is 
over 4x faster than the re sults reported f or tre e-optimized 
counts-in-cells methods I Szanudi et al llHi. We also 
note that while tree methods also lead to very large speed 
ups, they are still subject to noise from the point process 
for low amplitude signals. 

We are currently applying this new technique to exam¬ 
ine the evolution of high order moments in cosmological 
density fields at low amplitude levels and will present our 
findings elsewhere (Thacker, Couchman and Scoccimarro 
in prep). We also anticipate making the codes described 
in this paper publically available in the near future. 
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